Using Word-Pair Identifier to Improve Chinese Input System
نویسنده
چکیده
This paper presents a word-pair (WP) identifier that can be used to resolve homonym/segmentation ambiguities and perform syllable-to-word (STW) conversion effectively for improving Chinese input systems. The experiment results show the following: (1) the WP identifier is able to achieve tonal (syllables with four tones) and toneless (syllables without four tones) STW accuracies of 98.5% and 90.7%, respectively, among the identified word-pairs; (2) while applying the WP identifier, together with the Microsoft input method editor 2003 and an optimized bigram model, the tonal and toneless STW improvements of the two input systems are 27.5%/18.9% and 22.1%/18.8%, respectively.
منابع مشابه
Applying a Mix Word-Pair Identifier to the Chinese Syllable-to-Word Conversion Problem
This paper describes a mix word-pair mix-WP) identifier to resolve homonym/segmentation ambiguities as well as perform STW conversion effectively for Chinese input. The mix-WP identifier includes a specific word-pair (SWP) identifier and a common wordpair (CWP) identifier. It is designed as a supporting processing with Chinese input systems. Our experiments show that by applying the mix-WP iden...
متن کاملApplying Meaningful Word-Pair Identifier to the Chinese Syllable-to-Word Conversion Problem
Syllable-to-word (STW) conversion is a frequently used Chinese input method that is fundamental to syllable/speech understanding. The two major problems with STW conversion are the segmentation of syllable input and the ambiguities caused by homonyms. This paper describes a meaningful word-pair (MWP) identifier that can be used to resolve homonym/segmentation ambiguities and perform STW convers...
متن کاملApplying an NVEF Word-Pair Identifier to the Chinese Syllable-to-Word Conversion Problem
Syllable-to-word (STW) conversion is important in Chinese phonetic input methods and speech recognition. There are two major problems in the STW conversion: (1) resolving the ambiguity caused by homonyms; (2) determining the word segmentation. This paper describes a noun-verb event-frame (NVEF) word identifier that can be used to solve these problems effectively. Our approach includes (a) an NV...
متن کاملWord Sense Disambiguation and Sense-Based NV Event Frame
Word sense is ambiguous in natural language processing (NLP). This phenomenon is particularly keen in cases involving noun-verb (NV) word-pairs. This paper describes a sense-based noun-verb event frame (NVEF) identifier that can be used to disambiguate word sense in Chinese sentences effectively. A knowledge representation system (the NVEF-KR tree) for the NVEF sense-pair identifier is also pro...
متن کاملUsing Word Support Model to Improve Chinese Input System
This paper presents a word support model (WSM). The WSM can effectively perform homophone selection and syllable-word segmentation to improve Chinese input systems. The experimental results show that: (1) the WSM is able to achieve tonal (syllables input with four tones) and toneless (syllables input without four tones) syllable-to-word (STW) accuracies of 99% and 92%, respectively, among the c...
متن کامل